    Contraction of Locally Differentially Private Mechanisms

    We investigate the contraction properties of locally differentially private mechanisms. More specifically, we derive tight upper bounds on the divergence between the output distributions $P\mathsf{K}$ and $Q\mathsf{K}$ of an $\varepsilon$-LDP mechanism $\mathsf{K}$ in terms of a divergence between the corresponding input distributions $P$ and $Q$. Our first main technical result presents a sharp upper bound on the $\chi^2$-divergence $\chi^2(P\mathsf{K}\|Q\mathsf{K})$ in terms of $\chi^2(P\|Q)$ and $\varepsilon$. We also show that the same result holds for a large family of divergences, including KL-divergence and squared Hellinger distance. The second main technical result gives an upper bound on $\chi^2(P\mathsf{K}\|Q\mathsf{K})$ in terms of the total variation distance $\mathsf{TV}(P, Q)$ and $\varepsilon$. We then utilize these bounds to establish locally private versions of the van Trees inequality, Le Cam's, Assouad's, and the mutual information methods, which are powerful tools for bounding minimax estimation risks. These results are shown to yield better privacy analyses than the state of the art in several statistical problems, such as entropy and discrete distribution estimation, non-parametric density estimation, and hypothesis testing.
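
As a small numerical illustration of the setting in this abstract, the sketch below builds a k-ary randomized response kernel, a standard example of an $\varepsilon$-LDP mechanism, and compares $\chi^2(P\|Q)$ with $\chi^2(P\mathsf{K}\|Q\mathsf{K})$. The mechanism, the function names, and the example distributions are assumptions chosen for illustration and do not come from the paper. The code checks only the generic data-processing inequality, not the paper's sharper $\varepsilon$-dependent bound.

```python
import numpy as np


def randomized_response_kernel(k: int, eps: float) -> np.ndarray:
    """Row-stochastic k x k kernel for k-ary randomized response.

    Row x is the output distribution given input x: the true symbol is
    reported with probability e^eps / (e^eps + k - 1), and each other
    symbol with probability 1 / (e^eps + k - 1).
    """
    e = np.exp(eps)
    kernel = np.full((k, k), 1.0 / (e + k - 1))
    np.fill_diagonal(kernel, e / (e + k - 1))
    return kernel


def chi2_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """chi^2(p || q) = sum_y (p(y) - q(y))^2 / q(y), assuming q > 0 everywhere."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum((p - q) ** 2 / q))


# Hypothetical input distributions on three symbols.
P = np.array([0.7, 0.2, 0.1])
Q = np.array([0.2, 0.3, 0.5])

chi2_in = chi2_divergence(P, Q)
results = {}
for eps in (0.1, 1.0, 3.0):
    K = randomized_response_kernel(3, eps)
    # Output distributions are P K and Q K (row vector times kernel).
    results[eps] = chi2_divergence(P @ K, Q @ K)
    print(f"eps={eps}: chi2 in = {chi2_in:.4f}, chi2 out = {results[eps]:.4f}")
```

For every value of `eps`, the output divergence cannot exceed the input divergence. The abstract's contribution is to quantify how much smaller it must be as a function of $\varepsilon$.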

    Crystal structure of the outer membrane protein OmpU from Vibrio cholerae at 2.2 Å resolution

    Vibrio cholerae causes a severe disease that kills thousands of people annually. The outer membrane protein OmpU is the most abundant outer membrane protein in V. cholerae, and has been identified as an important virulence factor that is involved in host-cell interaction and recognition, as well as being critical for the survival of the pathogenic V. cholerae in the host body and in harsh environments. The mechanism of these processes is not well understood owing to the lack of a structure for V. cholerae OmpU. Here, the crystal structure of the V. cholerae OmpU trimer is reported to a resolution of 2.2 Å. The protomer forms a 16-β-stranded barrel with a noncanonical N-terminal coil located in the lumen of the barrel that consists of residues Gly32–Ser42 and is observed to participate in forming the second gate in the pore. Mapping the published functional data onto the OmpU structure reinforces the notion that the long extracellular loop L4 with a β-hairpin-like motif may be critical for host-cell binding and invasion, while L3, L4 and L8 are crucially implicated in phage recognition by V. cholerae.

    Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data

    We study stochastic convex optimization with heavy-tailed data under the constraint of differential privacy (DP). Most prior work on this problem is restricted to the case where the loss function is Lipschitz. Instead, as introduced by Wang, Xiao, Devadas, and Xu [WangXDX20], we study general convex loss functions with the assumption that the distribution of gradients has bounded $k$-th moments. We provide improved upper bounds on the excess population risk under concentrated DP for convex and strongly convex loss functions. Along the way, we derive new algorithms for private mean estimation of heavy-tailed distributions, under both pure and concentrated DP. Finally, we prove nearly-matching lower bounds for private stochastic convex optimization with strongly convex losses and mean estimation, showing new separations between pure and concentrated DP.
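
To make "private mean estimation of heavy-tailed distributions" concrete, here is a minimal sketch of a common baseline: clip each sample to a bounded interval, average, and add Laplace noise calibrated to the sensitivity of the clipped mean. This is a textbook pure-DP construction used only for illustration. It is not the algorithm from the paper, and the function name, the clipping radius, and the Student-t test data are all assumptions.

```python
import numpy as np


def private_clipped_mean(x, clip_radius: float, eps: float, rng) -> float:
    """eps-DP estimate of the mean of scalar samples (clip, then Laplace noise).

    Clipping to [-clip_radius, clip_radius] bounds each sample's influence,
    so replacing one sample changes the clipped mean by at most
    2 * clip_radius / n. Laplace noise with scale sensitivity / eps then
    gives pure eps-DP for the released value.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    clipped = np.clip(x, -clip_radius, clip_radius)
    sensitivity = 2.0 * clip_radius / n
    noise = rng.laplace(loc=0.0, scale=sensitivity / eps)
    return float(clipped.mean() + noise)


rng = np.random.default_rng(0)
# Hypothetical heavy-tailed data: Student-t with 3 degrees of freedom
# (true mean 0, finite variance, infinite fourth moment).
samples = rng.standard_t(df=3, size=10_000)
estimate = private_clipped_mean(samples, clip_radius=10.0, eps=1.0, rng=rng)
print(f"private mean estimate: {estimate:.4f}")
```

The clipping radius trades bias (from truncating the tails) against noise (which grows with the radius). Choosing it as a function of the moment bound, the sample size, and the privacy parameter is where the analysis of heavy-tailed estimators does its work.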

    Estimating Smooth GLM in Non-interactive Local Differential Privacy Model with Public Unlabeled Data

    In this paper, we study the problem of estimating smooth Generalized Linear Models (GLM) in the Non-interactive Local Differential Privacy (NLDP) model. Different from its classical setting, our model allows the server to access some additional public but unlabeled data. By using Stein's lemma and its variants, we first show that there is an $(\epsilon, \delta)$-NLDP algorithm for GLM (under some mild assumptions), if each data record is sampled i.i.d. from some sub-Gaussian distribution with bounded $\ell_1$-norm. Then with high probability, the sample complexity of the public and private data, for the algorithm to achieve an $\alpha$ estimation error (in $\ell_\infty$-norm), is $O(p^2\alpha^{-2})$ and $O(p^2\alpha^{-2}\epsilon^{-2})$, respectively, if $\alpha$ is not too small (i.e., $\alpha \geq \Omega(\frac{1}{\sqrt{p}})$), where $p$ is the dimensionality of the data. This is a significant improvement over the previously known quasi-polynomial (in $\alpha$) or exponential (in $p$) complexity of GLM with no public data. Also, our algorithm can answer multiple (at most $\exp(O(p))$) GLM queries with the same sample complexities as in the one GLM query case with at least constant probability. We then extend our idea to the non-linear regression problem and show a similar phenomenon for it. Finally, we demonstrate the effectiveness of our algorithms through experiments on both synthetic and real-world datasets. To the best of our knowledge, this is the first paper showing the existence of efficient and effective algorithms for GLM and non-linear regression in the NLDP model with public unlabeled data.